Beyond Average Scores: Identification of Consistent and Inconsistent Academic Achievement in Grouping Units

DIPS - 11/04/2024

Marwin Carmo

Introduction

  • Traditional educational research often fixates on average academic achievement.

  • Average performance and variability convey distinct information.

    • Consistent performance is positively correlated with student motivation and predicts more favorable long-term educational outcomes.
    • High variability can lead to disproportionate representation of certain demographic groups at both ends of the achievement spectrum.

Introduction

  • Consider two schools with a final average grade of 75% at the year’s end:
    • In one school, students had scores ranging from 50% to 100%.
    • In the other, they maintained a steady 75%.

Our goal

  • How can we identify schools with unusually high or low variability in academic achievement?

  • We adapt the mixed-effects location scale model (MELSM) by incorporating a spike-and-slab prior into the scale component to select or shrink random effects.

  • Based on Bayes factors, we can decide on whether a school is consistent or inconsistent in its academic achievement.

The traditional MLM

  • Educational data is classically analyzed by means of multilevel models (MLM):
    • Schools are level-2 units and students’ test scores are the level-1 units.
  • Limitation: The MLM assumes a constant within-school residual variance, potentially masking important differences in variability that reflect underlying factors such as teaching quality, socioeconomic influences, or student engagement.

The traditional MLM

\[\begin{aligned} y_{ij} &= \gamma_0 + u_{0j} + \varepsilon_{ij}\\ u_{0j} &\sim \mathcal{N} (0, \tau_{u_0}^2)\\ \varepsilon_{ij} &\sim \mathcal{N} (0, \color{red}{\sigma_{\varepsilon}^2}) \end{aligned}\]

The MELSM framework

  • MELSM allows for the simultaneous estimation of a model for the means (location) and a model for the residual variance (scale).

  • Both sub-models are conceptualized as mixed-effect models.

[Diagram: Observed Data → Location Model (mean structure); Observed Data → Scale Model (variability structure)]

The MELSM framework

\[\begin{aligned} y_{ij} &= \gamma_0 + u_{0j} + \varepsilon_{ij}\\ \sigma_{\varepsilon_{ij}} &= \exp(\eta_0 + t_{0j}) \end{aligned}\]

\[\begin{equation} \textbf{v}= \begin{bmatrix} u_0 \\ t_0 \end{bmatrix} \sim \mathcal{N} \begin{pmatrix} \boldsymbol{0}= \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \boldsymbol{\Sigma}= \begin{bmatrix} \tau^2_{u_0} & \tau_{u_0t_0} \\ \tau_{u_0t_0} & \tau^2_{t_0} \end{bmatrix} \end{pmatrix} \end{equation}\]
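
To make the two sub-models concrete, the following is a minimal simulation sketch of the intercept-only MELSM above, using only the Python standard library. The function name `simulate_melsm` and all parameter values (`tau_u0`, `tau_t0`, `rho`, and so on) are illustrative assumptions, not estimates from the study.

```python
import math
import random
import statistics


def simulate_melsm(n_schools=5, n_students=200, gamma0=0.0, eta0=0.0,
                   tau_u0=0.5, tau_t0=0.3, rho=0.4, seed=1):
    """Simulate scores from an intercept-only MELSM (illustrative values)."""
    rng = random.Random(seed)
    schools = []
    for j in range(n_schools):
        # Correlated school effects: u0 (location) and t0 (scale).
        z_u, z_t = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u0 = tau_u0 * z_u
        t0 = tau_t0 * (rho * z_u + math.sqrt(1.0 - rho ** 2) * z_t)
        # The log link keeps the school-specific residual SD positive.
        sigma = math.exp(eta0 + t0)
        scores = [gamma0 + u0 + rng.gauss(0.0, sigma)
                  for _ in range(n_students)]
        schools.append({"school": j, "u0": u0, "t0": t0,
                        "sigma": sigma, "scores": scores})
    return schools


# Compare each school's sample mean and SD with its true residual SD.
for school in simulate_melsm():
    print(school["school"],
          round(statistics.mean(school["scores"]), 2),
          round(statistics.stdev(school["scores"]), 2),
          round(school["sigma"], 2))
```

Schools with a large positive scale effect show a wider spread of scores even when their means are similar, which is the pattern the scale model is meant to capture.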

Advantages of MELSM

  • Residual variance is not treated as mere random noise; it is modeled explicitly.

  • Accounts for possible correlations between location and scale effects.

  • Allows the inclusion of specific predictors in both sub-models.

    • For example, the variability of student performance within a school can be modeled as a function of parental socioeconomic status.

Spike and Slab MELSM

  • We incorporate the spike-and-slab prior as a method for selecting random effects in the scale model.

  • The model is allowed to switch between two assumptions:

    • One that places high probability on a common error standard deviation (\(\sigma\));
    • Another that captures school- and student-specific error variability (\(\sigma_{\color{red}{ij}}\)).

The mechanism

\[\begin{equation} \color{lightgray}{ \textbf{v}= \begin{bmatrix} u_0 \\ t_0 \end{bmatrix} \sim \mathcal{N}} \begin{pmatrix} \color{lightgray}{ \boldsymbol{0}= \begin{bmatrix} 0 \\ 0 \end{bmatrix},} \boldsymbol{\Sigma}= \begin{bmatrix} \tau^2_{u_0} & \tau_{u_0t_0} \\ \tau_{u_0t_0} & \tau^2_{t_0} \end{bmatrix} \end{pmatrix} \end{equation}\]

  • To facilitate computation and the definition of priors, we decompose \(\boldsymbol{\Sigma}\) into \(\boldsymbol\Sigma = \boldsymbol{\tau}\boldsymbol{\Omega\tau}'\), where
    • \(\boldsymbol{\tau}\) is a diagonal matrix of the random-effects standard deviations.
    • \(\boldsymbol{\Omega}\) is the correlation matrix among all random effects.

The mechanism

  • Next, we factorize \(\boldsymbol\Omega\) via its lower-triangular Cholesky factor \(\textbf{L}\), such that \(\boldsymbol\Omega = \textbf{L}\textbf{L}'\).

\[\begin{equation} \label{eq:cholesky_approach} \textbf{L} = \begin{pmatrix} 1 & 0 \\ \rho_{u_0t_0} & \sqrt{1 - \rho_{u_0t_0}^2} \end{pmatrix} \end{equation}\]

  • If we multiply \(\textbf{L}\) by the random-effect standard deviations, \(\boldsymbol{\tau}\), and by a vector of standard normal draws, \(\boldsymbol{z}\), we obtain \(\textbf{v}\):

\[\begin{equation} \textbf{v} = \boldsymbol{\tau}\textbf{L}\boldsymbol{z} \end{equation}\]

  • The Cholesky decomposition lets us express the random effects in terms of their standard deviations and correlations.
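
As a quick numerical check of this construction, the sketch below builds the 2×2 factor \(\textbf{L}\), multiplies it by its transpose to recover \(\boldsymbol\Omega\), rebuilds \(\boldsymbol\Sigma\) from \(\boldsymbol\tau\), and computes \(\textbf{v} = \boldsymbol{\tau}\textbf{L}\boldsymbol{z}\) for one draw. The helper names and the values of `rho`, `tau`, and `z` are assumptions chosen for illustration.

```python
import math


def cholesky_corr_2x2(rho):
    """Lower-triangular Cholesky factor of a 2x2 correlation matrix."""
    return [[1.0, 0.0], [rho, math.sqrt(1.0 - rho ** 2)]]


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


rho = 0.4                          # illustrative correlation
tau = [[0.5, 0.0], [0.0, 0.3]]     # illustrative random-effect SDs
L = cholesky_corr_2x2(rho)

omega = matmul(L, transpose(L))                       # L L' -> correlation matrix
cov = matmul(matmul(tau, omega), transpose(tau))      # tau Omega tau'

z = [[1.2], [-0.7]]                # one hypothetical standard normal draw
v = matmul(matmul(tau, L), z)      # v = tau L z -> (u_0, t_0)

print("Omega:", omega)
print("Sigma:", cov)
print("v:", [row[0] for row in v])
```

The off-diagonal of the recovered matrix equals `rho`, and the second element of `v` matches the expanded expression for the scale random effect used in the variable-selection step.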

Variable selection

  • We include an indicator variable (\(\delta_{jk}\)) for each random effect to be subjected to shrinkage.

  • It allows switching between the spike and slab throughout the MCMC sampling process.

\[\begin{equation} \begin{aligned} u_{0j} &= \tau_{u_0}z_{ju_0}\\ t_{0j} &= \tau_{t_0}\left( \rho_{u_0t_0}z_{ju_0} + z_{jt_0}\sqrt{1 - \rho_{u_0t_0}^2} \right)\color{red}{\delta_{jt_0}} \end{aligned} \end{equation}\]

  • Each element of \(\boldsymbol{\delta}_j\) takes a value in \(\{0,1\}\) and follows \(\delta_{jk} \sim \text{Bernoulli}(\pi)\).

  • When a 0 is sampled, the random effect \(t_{0j}\) drops out of the equation, leaving only the fixed effect \(\eta_0\).

\[\begin{equation} \label{eq:mm_delta} \sigma_{\varepsilon_{ij}} = \begin{cases} \exp(\eta_0 + 0), & \text{if }\delta_{jt_0} = 0 , \\ \exp(\eta_0 + t_{0j}), & \text{if }\delta_{jt_0} = 1 \end{cases}. \end{equation}\]
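
The switching behavior can be sketched in a few lines. This is a minimal illustration of the two equations above, not the sampler used in the study; the function names and input values are hypothetical.

```python
import math


def scale_random_effect(tau_t0, rho, z_u, z_t, delta):
    """Scale random effect t_0j, multiplied by the indicator delta."""
    if delta not in (0, 1):
        raise ValueError("delta must be 0 or 1")
    return tau_t0 * (rho * z_u + z_t * math.sqrt(1.0 - rho ** 2)) * delta


def residual_sd(eta0, t0j):
    """Residual SD on the natural scale via the log link."""
    return math.exp(eta0 + t0j)


eta0, tau_t0, rho = 0.5, 0.3, 0.4   # illustrative values
z_u, z_t = 1.2, -0.7                # hypothetical standard normal draws

for delta in (0, 1):
    t0j = scale_random_effect(tau_t0, rho, z_u, z_t, delta)
    print(delta, t0j, residual_sd(eta0, t0j))
```

With `delta = 0` the school shares the common residual SD `exp(eta0)`; with `delta = 1` it receives its own deviation.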

The Spike-and-Slab approach

Rouder et al. (2018)

If \(\delta= 0\), the density “spikes” at the zero point mass;

If \(\delta= 1\), the standard normal prior, \(z_{jk}\), is retained and scaled by \(\tau_k\), introducing the “slab”.

Posterior Inclusion Probability (PIP)

  • Quantifies the probability that a given random effect is included in the model, conditional on the observed data:

\[\begin{align} \label{eq:pip_theorical} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{Pr(\textbf{Y} | \delta_{jk} = 1)Pr(\delta_{jk} = 1)}{Pr(\textbf{Y})} \end{align}\]

  • The PIP is estimated by the proportion of MCMC samples where \(\delta_{jk} = 1\):

\[\begin{align} \label{eq:pip} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{1}{S} \sum_{s = 1}^S \delta_{jks} \end{align}\]
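
In code, this estimate is simply the mean of the sampled indicators. The sketch below assumes the draws for one school are available as a list of 0s and 1s; the function name and the example draws are made up for illustration.

```python
def posterior_inclusion_probability(delta_draws):
    """Proportion of MCMC draws in which the indicator equals 1."""
    if not delta_draws:
        raise ValueError("need at least one draw")
    if any(d not in (0, 1) for d in delta_draws):
        raise ValueError("draws must be 0 or 1")
    return sum(delta_draws) / len(delta_draws)


# Hypothetical draws of delta for one school's scale random effect.
draws = [1, 1, 0, 1, 1, 1, 0, 1]
pip = posterior_inclusion_probability(draws)
print(pip)
```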

If there is evidence for zero variance in the scale random effects, the model reduces to the MLM assumption:

\[\varepsilon_{ij}\sim\mathcal{N}(0, \sigma_{\varepsilon}^2)\]

If not, the MELSM assumption of variance heterogeneity is retained:

\[\varepsilon_{ij}\sim\mathcal{N}(0, \sigma_{\varepsilon_{ij}}^2)\]

  • The PIP gives us a probabilistic measure and does not perform automatic variable selection.
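  • We estimate the strength of evidence through Bayes factors. With a prior inclusion probability of \(\pi = 0.5\) (prior odds of 1), the Bayes factor equals the posterior odds: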

\[\begin{align} \label{eq:bf_pip} BF_{10j} = \frac{Pr(\delta_{jk} = 1 | \textbf{Y}) }{1 - Pr(\delta_{jk} = 1 | \textbf{Y}) } \end{align}\]

  • Since \(0.75 / 0.25 = 3\), a BF\(_{10}\) > 3 corresponds to a PIP > 0.75: the data favor including this random effect over excluding it by more than three to one.
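
The conversion from PIP to Bayes factor can be sketched as follows. The `prior_prob` argument is an assumption added here for illustration: at its default of 0.5 the prior odds are 1 and the result reduces to the PIP odds in the formula above.

```python
import math


def bayes_factor_from_pip(pip, prior_prob=0.5):
    """Bayes factor for inclusion: posterior odds divided by prior odds."""
    if not 0.0 <= pip <= 1.0:
        raise ValueError("pip must lie in [0, 1]")
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    if pip == 1.0:
        return math.inf   # every draw included the effect
    posterior_odds = pip / (1.0 - pip)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# Hypothetical PIPs for three schools, checked against the BF > 3 threshold.
for pip in (0.50, 0.75, 0.90):
    bf = bayes_factor_from_pip(pip)
    print(pip, bf, bf > 3)
```

A finite MCMC run can yield a PIP of exactly 0 or 1, so in practice the resulting Bayes factor is bounded by the number of posterior draws.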

Case study

  • We use a subset of data from the 2021 Brazilian Evaluation System of Elementary Education (Saeb).

  • The subset comprises math scores from 11th- and 12th-grade students across 160 randomly selected schools, totaling 11,386 students.

  • The analysis compares three SS-MELSM models with varying levels of complexity:

    • Model 1: Includes only fixed and random intercepts for both location and scale, without any predictors.
    • Model 2: Incorporates student and school socioeconomic status (SES) as covariates, but retains the two random intercept effects.
    • Model 3: Introduces a random slope for student-level SES within the scale portion of the model.

Software and estimation

  • The models were fitted using the ivd package in R (Rast & Carmo, 2024).

  • All models were fitted with six chains of 3,000 iterations and 12,000 warm-up samples.

  • We assessed convergence with \(\hat{R}\) and sampling efficiency with the effective sample size (ESS).

  • The models were compared for predictive accuracy using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO).

Results

  • Model 1 identified eight schools with PIPs exceeding 0.75, suggesting notable deviations from the average within-school variance.

  • By incorporating SES covariates, Model 2 significantly outperformed Model 1 in terms of predictive accuracy, \(\Delta\widehat{\text{elpd}}_{\text{loo}}= -43.6 (10.5)\).

  • Model 3 was practically indistinguishable from Model 2; the inclusion of a random slope for the student-level SES did not improve the model’s predictive accuracy, \(\Delta\widehat{\text{elpd}}_{\text{loo}}= -1.3 (0.6)\).

Discussion

  • The SS-MELSM helps identify schools that deviate from the norm in within-school variability.

  • The spike-and-slab prior accounts for uncertainty in including random effects.

  • Identifying variability can guide resource allocation or teaching interventions.

Limitations

  • Currently, the SS-MELSM demands significant computational resources, especially with larger datasets or more complex models.

  • It is still not clear how model performance is affected by the choice of hyperparameters.

  • Further development could explore the method’s performance in longitudinal data settings.